The replication files are the following:
----------------------------------------

For the analysis based on publication performance:

- pubdata.dta (saved in Stata v11.2, 32-bit Windows version, http://stata.com/): the file that defines top performance in terms of the researchers' publication output. This file allows replicating the results in Table 2 of the paper, using pubanalysis.do. Note that this file is kept in Stata format rather than ASCII format because the analysis (stset + stcox) relies on Stata's way of encoding dates.

- pubdata.txt: ASCII version of pubdata.dta

- pubanalysis.do: the Stata script (version 11.2) that runs the duration analysis; it uses pubdata.dta as input. The do-file can be read with any ASCII editor.

- pubanalysis.log: log file containing the output of pubanalysis.do.



For the analysis based on citation performance:

- citdata.dta (saved in Stata v11.2, 32-bit Windows version, http://stata.com/): the file that defines top performance in terms of the researchers' citation output. This file allows replicating the results in Table 3 (columns 3 & 4) of the paper, using citanalysis.do. Note that this file is kept in Stata format rather than ASCII format because the analysis (stset + stcox) relies on Stata's way of encoding dates.

- citdata.txt: ASCII version of citdata.dta

- citanalysis.do: the Stata script (version 11.2) that runs the duration analysis; it uses citdata.dta as input. The do-file can be read with any ASCII editor.

- citanalysis.log: log file containing the output of citanalysis.do.



The two data files contain the following variables:
---------------------------------------------------
persnr		personal identifier
begin		start date of current period
end		end date of current period
finalobs	indicator for final observation of an individual
h2y		researcher's best performance in (y, y+1) is HIGH (rather than MEDIUM or LOW)
prefirst2y	observation <= 1st top performance
pubm_pre92h	top performer prior to 1992
zapdate		date of entry in ZAP (=professorship)
male		male
age		age
maind_B		main discipline = biosciences
maind_C		main discipline = chemistry
maind_E		main discipline = engineering
maind_G		main discipline = geosciences
maind_H		main discipline = mathematics
maind_I		main discipline = medicine I
maind_M		main discipline = medicine II
maind_N		main discipline = neuroscience
maind_P		main discipline = physics
maind_R		main discipline = biomedical
maind_Z		main discipline = biology
fac_wet		member of faculty of science
fac_twt		member of faculty of engineering
fac_lbw		member of faculty of agriculture
fac_gen		member of faculty of medicine
fac_far		member of faculty of pharmacy
r_doc1		rank 1 (t-1)
r_hdoc1		rank 2 (t-1)
r_hl1		rank 3 (t-1)
r_rest1		other rank (t-1)
yrsrank		seniority in rank
headuni1	head of unit (t-1)
zapaft92	entry as professor >= 1992
yrszap		career age
fulltiun	fulltime at university
tchload		teaching load
cumGOAlag1	cumulative count of type I funding (t-1)
cumOTlag1	cumulative count of type II funding (t-1)
avgco		average number of co-authors per article
avgcom		average number of co-authors per article * male
pasttop2y	number of previous top performances
pasttop2ym	number of previous top performances * male
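
Note for users of the ASCII versions: the date variables (begin, end, zapdate) follow Stata's way of encoding dates. The sketch below assumes they are Stata daily dates, i.e. integer counts of days since 1 January 1960. That assumption is not confirmed by this README and should be checked against the formats in the .dta files. The function names are illustrative only and are not part of the replication files.

```python
from datetime import date, timedelta

# Stata daily dates count days from 1 January 1960 (day 0).
STATA_EPOCH = date(1960, 1, 1)


def stata_td_to_date(days):
    """Convert a Stata daily-date integer to a calendar date."""
    return STATA_EPOCH + timedelta(days=int(days))


def date_to_stata_td(d):
    """Convert a calendar date back to a Stata daily-date integer."""
    return (d - STATA_EPOCH).days


if __name__ == "__main__":
    # 11688 days after 1 January 1960 is 1 January 1992.
    print(stata_td_to_date(11688))             # 1992-01-01
    print(date_to_stata_td(date(1992, 1, 1)))  # 11688
```

If the date variables turn out to use another Stata format (for example monthly or yearly), the conversion above does not apply.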


The data files have been constructed through the combination and cleaning of several data sources:
- publication and citation data (Centre for R&D Monitoring (ECOOM), KU Leuven)
- personnel records (KU Leuven personnel department)
- research funding records (KU Leuven Research Coordination Service)


Details on data cleaning and combination are available from the authors:
Stijn Kelchtermans
HU Brussel, Warmoesberg 26, 1000 BRUSSELS, Belgium
KU Leuven, Naamsestraat 69, 3000 LEUVEN, Belgium
stijn.kelchtermans@kuleuven.be

Reinhilde Veugelers
KU Leuven, Naamsestraat 69, 3000 LEUVEN, Belgium
reinhilde.veugelers@kuleuven.be